One-Class SVMs for Document Classification
Authors
Abstract
We implemented versions of the SVM appropriate for one-class classification in the context of information retrieval. The experiments were conducted on the standard Reuters data set. For the SVM implementation we used both the version of Schölkopf et al. and a somewhat different version of one-class SVM based on identifying “outlier” data as representative of the second class. We report on experiments with different kernels for both of these implementations and with different representations of the data, including binary vectors, the tf-idf representation, and a modification called the “Hadamard” representation. We then compared these with one-class versions of the prototype (Rocchio), nearest neighbor, and naive Bayes algorithms, and finally with a natural one-class neural network classification method based on “bottleneck” compression generated filters. The SVM approach as represented by Schölkopf was superior to all the methods except the neural network one, to which it was essentially comparable, although occasionally worse. However, the SVM methods turned out to be quite sensitive to the choice of representation and kernel in ways which are not well understood; for the time being, therefore, the neural network approach remains the most robust.
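As a rough illustration of the one-class setting the abstract describes, the sketch below trains on positive documents only, using a tf-idf representation and a prototype (Rocchio-style) centroid with a cosine-similarity threshold. It is not the authors' implementation and it is not a one-class SVM. The toy documents, the function names (`build_idf`, `tfidf`, `train_prototype`, `is_member`), the smoothed idf formula, and the threshold rule (the minimum similarity seen in training) are all assumptions made for this example.

```python
import math
from collections import Counter

def build_idf(docs):
    """Smoothed idf from tokenized training documents (assumed formula)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized tf-idf vector as a dict; unseen terms are ignored."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def train_prototype(docs):
    """Fit on positive examples only: idf, unit centroid, and threshold."""
    idf = build_idf(docs)
    vecs = [tfidf(d, idf) for d in docs]
    centroid = Counter()
    for v in vecs:
        for t, w in v.items():
            centroid[t] += w / len(vecs)
    norm = math.sqrt(sum(w * w for w in centroid.values()))
    centroid = {t: w / norm for t, w in centroid.items()}
    # Assumed rule: accept anything at least as similar to the centroid
    # as the least similar training document.
    threshold = min(cosine(v, centroid) for v in vecs)
    return idf, centroid, threshold

def is_member(doc, model):
    """True if the document is judged to belong to the trained class."""
    idf, centroid, threshold = model
    return cosine(tfidf(doc, idf), centroid) >= threshold

# Hypothetical positive-class training documents (no negatives are used).
positives = [
    "oil prices rise as opec cuts output".split(),
    "crude oil output falls in opec states".split(),
    "opec ministers discuss oil prices".split(),
]
model = train_prototype(positives)
in_class = is_member("opec oil output prices".split(), model)
out_class = is_member("wheat harvest grain exports".split(), model)
print(in_class, out_class)
```

A document that shares no vocabulary with the training set gets similarity zero and is rejected. How tightly the threshold is set is the main design choice in a sketch like this, and the minimum-similarity rule used here is only one simple option.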
Similar resources
Efficient Text Categorization Using a Min-Max Modular Support Vector Machine
The min-max modular support vector machine (M-SVM) has been proposed for solving large-scale and complex multiclass classification problems. In this paper, we apply the M-SVM to multilabel text categorization and introduce two task decomposition strategies into M-SVMs. A multilabel classification task can be split up into a set of two-class classification tasks. These two-class tasks are to dis...
Support Vector Machines for Multi-class Classification
Abstract: Support vector machines (SVMs) are primarily designed for 2-class classification problems. Although in several papers it is mentioned that the combination of K SVMs can be used to solve a K-class classification problem, such a procedure requires some care. In this paper, the scaling problem of different SVMs is highlighted. Various normalization methods are proposed to cope wi...
Chinese Question Classification Using Alternating and Iterative One-against-One Algorithm
Question classification plays a crucially important role in question answering systems because categorizing a given question helps identify an answer in the documents. The goal of question classification is to accurately assign labels to questions based on the expected answer type. Support vector machines (SVMs) have proved to be an excellent tool for machine learning, which were origi...
A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater
The aim of this work is to examine the feasibilities of the support vector machines (SVMs) and K-nearest neighbor (K-NN) classifier methods for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...
Distributed optimization of multi-class SVMs
Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straightforward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be stated, however, for the so-called all-in-one SVMs, which require solving a quadratic program whose size is quadratic in the number of classes. We develop ...
Multiclass Approaches for Support Vector Machine Based Land Cover Classification
SVMs were initially developed to perform binary classification; however, applications of binary classification are very limited. Most practical applications involve multiclass classification, especially in remote sensing land cover classification. A number of methods have been proposed to implement SVMs to produce multiclass classification. A number of methods to generate multiclass SVMs ...
Journal:
- Journal of Machine Learning Research
Volume 2, Issue
Pages -
Publication date: 2001